Abstract

Once a program file is modified, the recompilation time should be 
minimized, without sacrificing execution speed or high level object 
oriented features. The recompilation time is often a problem for the 
large graphical interactive distributed applications tackled by modern 
OO languages. A compilation server and fast code generator were
developed and integrated with the SRC Modula-3 compiler and Linux 
ELF dynamic linker. The resulting compilation and recompilation 
speedups are impressive. The impact of different language features,
processor speed, and application size is discussed.

Keywords: Compiler, Code generator, Dependency analysis, Persistent
cache, Smart recompilation

1 Introduction 

Recompilation speed is only one ingredient in the global picture of programming
productivity. Nonetheless, its impact on programmer satisfaction is not
to be underestimated. Interpreters are sometimes seen as the solution for zero
recompilation time. However, the modified module needs to be parsed into 
bytecode, which is not too different from machine code, especially if the link 
editing phase is partially obviated by dynamic linking. 

The recompilation may be divided into the following phases. The smart 
recompilation phase determines the modified files and computes the minimal 
set of files to recompile. The parsing and code generation phases convert the 
source code files into relocatable binary files. The prelink phase computes 
package wide information such as initialization order. Finally, the link phase 
combines all the relocatable binary files into a library or executable program. 

These phases are detailed in section 2 along with a discussion of the
impact of different language features on their complexity. This section motivates
the two main extensions brought by the authors to the DEC SRC
Modula-3 compiler [1], a compilation server and fast code generator, and
reviews related work.

Section 3 details the compilation server while section 4 describes the fast
integrated code generator for Linux ELF [2]. Section 5 presents the results
obtained with the enhanced compiler, and outlines the contribution of each
extension as well as the sensitivity to different parameters. In the conclusion,
the applicability of these results to other languages such as Java [3] and C++
[4] is discussed, and avenues for further development are examined.

2 Background 

The different phases involved in recompiling a package, program or library, 
are detailed in this section. The work performed by typical compilers and by 
the DEC SRC Modula-3 compiler, possible enhancements, and related work 
are discussed at key steps. 

The modification times of the files comprising a package are checked to determine
which ones were modified since the last compilation. These files always
need to be recompiled. In an integrated development environment, this information
may be provided by the editor, if all modifications are performed
through it.

If any type checking, or data structure member offset computation, is
performed at compile time, files have dependencies upon imported files containing
declarations. This is not the case for languages such as Smalltalk [5]
or LISP, which defer type checking to runtime, but applies to Java, C++
and Modula-3. When one of the declarations used by a file is changed, that
file needs to be recompiled too.

The use of a text level macro preprocessor, for importing declarations 
from other files, makes such dependency analysis extremely difficult in C 
and C++. The traditional approach embodied by makefiles is to recompile 
a file whenever an included file was modified. The DEC SRC Modula-3 
compiler remembers the fingerprint of each declaration used by a module, 
and recompiles the module only when any of the declarations used has a 
different fingerprint. This much finer grain dependency analysis (at the
level of individual type declarations instead of complete header files) explains
in large part the smaller recompilation times typically associated with more
structured languages, as compared to C/C++.

Computing the minimal set of files needing recompilation is the smart
recompilation phase and has been studied in [6], [7], [8], and [9]. Any reduction
in the number of files to recompile, due to a finer analysis, has a direct
impact on the total recompilation time.
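The difference between file level and declaration level dependency analysis can be illustrated with a small sketch. It is only an approximation of the idea described above, not the algorithm of the SRC Modula-3 compiler: the function names, the use of SHA-256 as a fingerprint, and the sample declarations are all assumptions made for the example.

```python
import hashlib


def fingerprint(declaration_text: str) -> str:
    """Stable digest of one declaration (stand-in for a compiler fingerprint)."""
    return hashlib.sha256(declaration_text.encode("utf-8")).hexdigest()


def modules_to_recompile(uses, old_fps, new_fps, modified):
    """Return the sorted list of modules that must be recompiled.

    uses:     module name -> set of declaration names the module uses
    old_fps:  declaration name -> fingerprint recorded at the last compilation
    new_fps:  declaration name -> fingerprint of the current declaration
    modified: modules whose own source file changed
    """
    result = set(modified)
    for module, used in uses.items():
        # A module is affected only if a declaration it actually uses changed.
        if any(old_fps.get(d) != new_fps.get(d) for d in used):
            result.add(module)
    return sorted(result)


# Hypothetical interface "Point" exporting two declarations.
old = {
    "Point.T": fingerprint("RECORD x, y: INTEGER END"),
    "Point.Origin": fingerprint("CONST Origin = T{0, 0}"),
}
new = dict(old)
new["Point.T"] = fingerprint("RECORD x, y, z: INTEGER END")  # one declaration changed

uses = {"Draw": {"Point.T"}, "Log": {"Point.Origin"}, "Main": set()}
print(modules_to_recompile(uses, old, new, modified={"Main"}))
```

In this example Log imports the same interface as Draw but is left alone, because the only declaration it uses kept its fingerprint. A makefile style rule keyed on the interface file would have recompiled both.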

The set of modified files, and of files potentially affected by modifications 
in files on which they depend, is recompiled in the parsing and code genera- 
tion phase. Compiling the individual files is often the most time consuming 
step. While minimizing the number of files to recompile is important, it is 
also possible to reduce the number of imported interfaces to parse, and to 
speedup the code generation phase. 

The DEC SRC Modula-3 compiler already has a cache for imported inter- 
face files. Each imported interface is read at most once in each recompilation. 
A first extension, described in section 3, was to convert the compiler into a 
compilation server. This way, an imported file may be kept in the interface 
cache and need not be read and reparsed if it did not change between two 
compilations. 

Koehler and Horspool [10] worked on a compilation server for the C language.
Most of their effort was spent analyzing the pre-processor context, to
determine if the imported file can be reused in different importers. It requires
a validation scheme rather different than the one proposed here. Onodera
proposed [11] a compilation server for a different language, COB, which has
two kinds of interfaces, one similar to C and one similar to Modula-3.

The code generation time is a significant fraction of the total compilation
time. A fast code generator for Linux ELF, based on an existing code generator
for NT in SRC Modula-3, is a second extension described in section
4. Tanenbaum et al. [12] and Fraser et al. [13] have obtained interesting
results with fast but less flexible non optimizing backends.

In the prelink phase, application wide information may need to be
computed for use by the run-time system. Non OO languages
like C had little or no such information. In Modula-3, the initialization
order of modules (based on the module dependency graph), the run time
type information (after resolving type equivalence), the coherence of opaque
type revelations, and the structures for checking inheritance relationships in
constant time, are computed during this prelink phase. In Java, initialization
order is determined at runtime. C++ does not specify an initialization order
and only recently started to offer run time type identification.
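Deriving an initialization order from the module dependency graph amounts to a topological sort. The sketch below uses Python's standard graphlib module; the module names are invented, and the real prelinker computes additional information (type equivalence, revelations) that is not modelled here.

```python
from graphlib import TopologicalSorter


def initialization_order(imports):
    """imports: module -> set of modules it imports.

    Returns a list in which every module appears after the modules it
    imports, so that imported modules are initialized first.
    Raises graphlib.CycleError if the import graph contains a cycle.
    """
    return list(TopologicalSorter(imports).static_order())


# Hypothetical program: Main imports A and B, which both import C.
imports = {"Main": {"A", "B"}, "A": {"C"}, "B": {"C"}, "C": set()}
order = initialization_order(imports)
print(order)
```

Several orders may satisfy the constraints (A and B can be swapped here), so only the relative position of a module and its imports is meaningful.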

The final step, performed by the link editor, is assembling the object code 
of all the modules into the final executable. With dynamic linking, such 
as in Linux ELF [2], the amount of processing required is greatly reduced. 
All references internal to a module use position independent code and do 
not require further link editing. Moreover, links to libraries are resolved 
at execution time. Data symbols in libraries are resolved upon loading the 
executable but procedure references are resolved only when needed, as the 
execution proceeds. 

3 Compilation Server 

A compilation server reduces the recompilation time by maintaining some 
information across compilations, instead of reading and reparsing the cor- 
responding files each time. Imported interfaces represent the bulk of the 
information required when compiling a file and often do not change from one 
compilation to another. Thus, the purpose of the compilation server is to 
maintain the interface cache across compilations. 

Implementing a compilation server for C/C++ presents serious difficulties
because of the preprocessor mechanism. For instance, the #ifdef in Figure
1 can lead to two different variables in the symbol table, depending on the
value of DEBUG. Moreover, the compilation of an included .h file does not
lead to an independent object file. Instead, the object code associated with
it is part of the relocatable object files produced by the compilation of ".c"
files that include the ".h" file.

    #ifdef DEBUG
    int x;
    #else
    int y;
    #endif

Figure 1: #ifdef example

In languages with explicit interfaces such as Modula-3, the content of an
interface is not dependent on the importing file and can be reused for several
importing files, even across compilations. Some languages such as Eiffel and
Java extract the interface from the program file. When the program file is
recompiled, the interface is extracted. It could be stored in the interface
cache of a compilation server in the same way.

[Diagram; elements shown: library 1, library 2, new program or library]

Figure 2: Dependencies among imported Modula-3 interfaces

Figure 2 shows the dependencies between interfaces A to E used by a
program P. An arrow shows that a module or an interface imports another
interface. This acyclic directed graph imposes a compilation order. The interfaces
at the leaves are compiled first. Each interface is compiled separately,
i.e. it has its own associated relocatable object file. However, an interface
would normally need to be parsed again whenever it is imported. This is the
costly part of the compilation avoided with the interface cache. The parsing
result, the abstract syntax tree (AST), is stored in the interface cache.

Between compilations, some of the interfaces in the cache may become
invalid. An interface is a candidate for invalidation if its associated source
file has been modified or if a directly or indirectly imported interface is
invalid. In the latter case, the fine grain dependency analysis is used to
determine if any declaration actually used by the interface has changed. If
not, the interface is still valid. Otherwise, the interface is removed from the
cache and will need to be read and parsed again.

Thus, the validation algorithm is to recursively visit each imported in- 
terface in the graph to see if its associated source file has been modified or 
if imported declarations have changed. A time stamp is used to mark each 
valid interface as valid for this recompilation. When the same interface ap- 
pears in the import graph of another file, the time stamp indicates that it 
has already been checked as valid. The existing implementation of DEC SRC 
Modula-3 did not have a validation phase since the interface cache was not 
kept across compilations. Any interface in the cache was necessarily parsed 
in the current compilation and therefore still valid. 
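The validation pass can be sketched as a recursive walk over the import graph. This is a simplified model written for illustration: the Interface class, its fields, and the assumption that the set of changed declarations of each interface is already known (for instance from fingerprint comparison) are not taken from the compiler sources.

```python
class Interface:
    """Cached interface as seen by the validation pass (illustrative model)."""

    def __init__(self, name, imports=(), used=None):
        self.name = name
        self.imports = list(imports)   # directly imported Interface objects
        self.used = used or {}         # imported name -> declarations used from it
        self.source_modified = False   # source file changed since it was cached
        self.changed_decls = set()     # declarations whose fingerprint changed
        self.checked = None            # stamp of the recompilation that visited it


def validate(iface, stamp, cache):
    """Return True if the cached AST of iface can be reused for recompilation `stamp`."""
    if iface.checked == stamp:         # already visited during this recompilation
        return iface.name in cache
    iface.checked = stamp
    ok = iface.name in cache and not iface.source_modified
    for imported in iface.imports:     # every import is visited, even if ok is False
        if not validate(imported, stamp, cache):
            # Fine grain check: only declarations actually used matter.
            if iface.used.get(imported.name, set()) & imported.changed_decls:
                ok = False
    if not ok:
        cache.pop(iface.name, None)    # will be read and parsed again
    return ok


# Hypothetical graph: A imports B, B and D import C, and C was edited.
c = Interface("C")
c.source_modified = True
c.changed_decls = {"c1"}
b = Interface("B", [c], {"C": {"c1"}})
b.changed_decls = {"b1"}
a = Interface("A", [b], {"B": {"b2"}})
d = Interface("D", [c], {"C": {"c2"}})

cache = {name: "AST of " + name for name in "ABCD"}
results = {i.name: validate(i, stamp=1, cache=cache) for i in (a, d)}
print(results, sorted(cache))
```

C is visited twice (through B and through D) but examined only once, thanks to the stamp. A and D stay in the cache because the declarations they use did not change, while B and C must be parsed again.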

A further optimization is added for interfaces imported from separate libraries
(packages). Indeed, each package has an associated file containing the
information used by the smart recompilation system. Any modification to an
interface in a library, and subsequent recompilation, changes the modification
time of the associated file. Thus, the modification time of library interfaces is
checked only if the associated file has changed. Many libraries being seldom
changed, this simple optimization avoids a large number of file modification
time checks. The generic interfaces (the equivalent of the templates in C++)
are not put in the cache because they don't have a corresponding AST. Their
instantiations, however, are eligible to be cached.

Figures 3 and 4 show the general structure of the compiler before and
after being transformed into a server. Libraries are represented as ovals
and programs as rectangles. Import relationships are indicated by arrows.
Programs executing other programs are shown with dashed lines.

Packages m3front and m3back are the frontend and the backend respectively.
M3linker is the smart recompilation system. Package m3 contains
the main procedure of the compiler. M3quake is a simple interpreter that
parses the m3 makefiles and passes parameters to the compiler (m3). M3build
initiates the m3quake interpreter in the appropriate directory, inserts the
platform and package dependent definitions, and sends the m3makefile to
m3quake. The connections to the two available code generators, and to the
linker, were left out for simplicity.

The existing compiler involved several processes which were merged in a
single executable program calling upon several libraries. Packages m3 and
m3quake were converted to libraries. M3build acted as the remaining program
and, instead of exiting after a recompilation, awaited further commands
from clients using network objects as shown in Figure 5.

[Diagram; components shown: m3build, m3quake, m3, m3linker, "driver", m3back, m3middle]

Figure 3: Structure of the SRC Modula-3 compiler

Each client request consists of a package name and location, build options
if any, and a network output stream to receive the error messages. The same
compilation server can process requests for different packages. Simultaneous
recompilation requests are serialized.

A small program, m3client, acts as a client that passes compilation requests
to the server. It replaces the compilation command normally entered
from the command line or through the editor menus.
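The request handling described above (one long lived process, a cache that survives between requests, and serialized requests) can be modelled in a few lines. This is a toy model, not the network object based server: the class name, the `imports` parameter, and the fake parser are assumptions made to keep the example self-contained.

```python
import io
import threading


class CompilationServer:
    """Toy model of a compilation server that outlives individual requests."""

    def __init__(self):
        self._lock = threading.Lock()   # simultaneous requests are serialized
        self.interface_cache = {}       # kept across requests
        self.parses = 0                 # number of interfaces actually parsed

    def _parse(self, name):
        self.parses += 1
        return f"AST({name})"           # placeholder for a real syntax tree

    def compile(self, init_dir, options, writer, imports):
        """Handle one request; `imports` stands in for the package's import list."""
        with self._lock:
            for name in imports:
                if name not in self.interface_cache:
                    self.interface_cache[name] = self._parse(name)
            writer.write(f"compiled {init_dir} with options {list(options)}\n")


server = CompilationServer()
out = io.StringIO()                     # plays the role of the network output stream
server.compile("/src/postcard", ["-g"], out, ["Text", "Wr", "VBT"])
server.compile("/src/columns", [], out, ["Text", "VBT", "Rd"])
print(server.parses)
print(out.getvalue(), end="")
```

The second request reuses the cached entries for Text and VBT, so only four interfaces are parsed for six imports. A client such as m3client would only need to forward the directory, the options, and an output stream.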

There is currently no mechanism or strategy to remove interfaces from the 
cache (e.g. least recently used). This is not a problem for a few users working 
on a few packages. However, if the compilation server executes for weeks and 
new packages are constantly being added, the memory growth is likely to 
become a problem. Deciding on an efficient strategy may require some study 
but would be a minor implementation effort. The problem of multi-user 
access control has not been addressed either. Sharing a compilation server 
would bring interesting benefits only in very specific environments. 

An alternative to maintaining the parsed interfaces in memory in a server
process across compilations is to store pre-parsed versions of the interfaces
on disk. This would be faster than having no compilation server if writing
and reloading these pre-parsed interfaces costs much less than the time saved
by not having to re-parse the interfaces. However, this is slower than keeping
the parsed interfaces already in memory, provided that enough memory is available.

[Diagram; components shown: m3client, m3server, m3quake]

Figure 4: Structure of the modified SRC Modula-3 compiler

4 Fast Code Generation 

Portable, retargetable, optimizing compilers such as gcc are built in several 
layers and construct abstract syntax trees, an intermediate language repre- 
sentation, and an assembly language file at different stages during the com- 
pilation. Furthermore, the interface between the Modula-3 frontend, written 
in Modula-3, and the gcc based backend introduces another intermediate 
representation. 

While the gcc based backend is retained to benefit from its optimizing 
capabilities and wide range of supported platforms, an integrated code gen- 
erator may be used on some platforms for faster compilation during the edit 
compile debug cycle. The code generator is fed by the frontend with simple 
virtual machine instructions and generates object code directly. It cannot 
perform sophisticated optimizations or target multiple platforms. 

An existing code generator for NT under Intel 386 was used as a base.
It was modified to support Linux ELF object files [2], and to produce position
independent code and debugging information. Position independent
code allows efficient dynamic linking, another important ingredient for fast
recompilation. Debugging information generation increases slightly the code
generation time but is almost essential for adequately supporting the edit
compile debug cycle.

    INTERFACE M3Server;

    IMPORT NetObj, Thread, Wr, Pathname, TextList;

    EXCEPTION
      Error(TEXT);

    TYPE
      T = NetObj.T OBJECT METHODS
        compile(init_dir: Pathname.T; options: TextList.T; writer: Wr.T)
          RAISES {NetObj.Error, Thread.Alerted, Error};
      END;

    END M3Server.

Figure 5: Network object interface to the compilation server

Figure 6 (a) shows a simple Modula-3 procedure returning the sum of two
values received as arguments. In (b), the sequence of method calls issued
by the Modula-3 frontend, and implemented by the code generator object,
is illustrated. The methods supported by the code generation object are
sometimes called the intermediate language. The arguments for the methods
were omitted in the example. In (c), the machine code produced by the
backend is presented in the AT&T assembly language format for the Intel 386
architecture.

At the beginning of a procedure, the method begin_procedure is called
by the frontend, and the first 6 machine instructions are generated. This
saves on the stack and initializes the registers used for setting up a new
frame according to the calling convention. Figure 7 shows the structure of
the stack frame constructed with these instructions. The ebp register (base
register) is used to reference other values in the stack frame by specifying
a 4 byte offset in an indexed address mode. The stack frame also includes
the arguments to the procedure, the return address of the calling procedure,
the value of ebp for the previous stack frame, local variables, and some saved
register values.

    (a) Modula-3 source

        PROCEDURE Add(i, j: INTEGER): INTEGER =
        BEGIN
          RETURN i + j;
        END Add;

    (b) Method calls              (c) Generated code

        begin_procedure(...)          pushl %ebp
                                      movl  %esp,%ebp
                                      pushl %ebx
                                      pushl %esi
                                      pushl %edi
        load(...)
        load(...)
        add(...)                      movl  0x8(%ebp),%edx
                                      addl  0xc(%ebp),%edx
        exit_proc(...)                movl  %edx,%eax
        end_procedure(...)            popl  %edi
                                      popl  %esi
                                      popl  %ebx
                                      leave
                                      ret

Figure 6: Compilation of a simple procedure.

Thereafter, the machine instructions corresponding to the body of the
Modula-3 procedure are generated. The load method is called twice by the
frontend, once for each operand. No machine code is generated for this. Instead,
the operands are pushed on the operand stack in the data structures
maintained by the code generator. The call of the method add by the frontend
generates the code for adding the operands previously pushed on the
stack. On the i386, at least one operand of add must be in a register. Thus,
one variable must be moved from its position in the stack frame in memory to
a register (here edx). The other operand can be addressed directly in memory
by instruction add. The result of the addition must be returned to the
calling procedure in register eax, as specified in the calling convention. Thus,
the value in edx is moved to eax when method exit_proc is called by the
frontend. The call to method end_procedure generates the code required to
remove the stack frame and restore affected registers.
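The deferred handling of operands can be mimicked with a very small code generator. The sketch below is an illustration only: the class name, the choice of edx as the single working register, and the textual output are assumptions, and the real backend writes binary ELF object code rather than assembly text.

```python
class TinyCodeGen:
    """Toy code generator driven by frontend method calls (AT&T i386 syntax)."""

    def __init__(self):
        self.code = []       # emitted instructions, in order
        self.operands = []   # operand stack: locations of values not yet combined

    def begin_procedure(self):
        # Set up the frame and save the callee-saved registers.
        self.code += ["pushl %ebp", "movl %esp,%ebp",
                      "pushl %ebx", "pushl %esi", "pushl %edi"]

    def load(self, offset):
        # No instruction is emitted: the operand location is only recorded.
        self.operands.append(f"{offset:#x}(%ebp)")

    def add(self):
        right = self.operands.pop()
        left = self.operands.pop()
        self.code.append(f"movl {left},%edx")    # one operand must be in a register
        self.code.append(f"addl {right},%edx")   # the other is addressed in memory
        self.operands.append("%edx")

    def exit_proc(self):
        result = self.operands.pop()
        if result != "%eax":                     # the result is returned in eax
            self.code.append(f"movl {result},%eax")

    def end_procedure(self):
        self.code += ["popl %edi", "popl %esi", "popl %ebx", "leave", "ret"]


gen = TinyCodeGen()
gen.begin_procedure()
gen.load(8)     # first argument, i
gen.load(12)    # second argument, j
gen.add()
gen.exit_proc()
gen.end_procedure()
print("\n".join(gen.code))
```

Because load emits nothing, the generator can pick an addressing mode only when the operation consuming the operands is known, which is what lets add reference the second argument directly in memory.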

Several optimizations to the generated code would be possible. For example,
the use of register edx could be avoided by using eax instead. This would
obviate the need to transfer the return value to eax in the end. Registers
ebx, esi and edi were needlessly saved and restored on the stack even if they
were not used during the body of the procedure. Even though the fast code
generator does not perform such optimizations, the generated code is still
more efficient than the naive code produced by gcc without optimization, as
seen in section 5.

    high addresses
       C(%ebp)    argument 1 (j)
       8(%ebp)    argument 0 (i)
       4(%ebp)    return address
       0(%ebp)    previous ebp
      -4(%ebp)    temporary variable
      -8(%ebp)    ebx
      -C(%ebp)    esi
     -10(%ebp)    edi
    low addresses

Figure 7: Stack frame

The code generator structure is designed around four major objects, as
shown in Figure 8. The main object is M3x86 which implements code generation
for procedure calls and returns, and produces the global variables section.
It also coordinates the code generation performed by the other objects.
Stackx86 implements the operand stack, performs the register allocation and
generates the code to move operands between the stack and the registers.
Detailed machine instructions layouts, including addressing modes, are obtained
through the Codex86 object. Everything is formatted into an ELF
binary file, with debugging information, by M3ObjFile. The relationships
between these objects are represented by arrows in Figure 8.

[Diagram; components shown: M3front, modified GCC, M3BackWindows, M3BackLinux, and the four objects below]

    M3x86      - procedure management (call, ret)
               - management of variables and the initialized data
    Stackx86   - operand stack management
               - register allocation
    Codex86    - address mode of the machine instructions
    M3ObjFile  - binary file production
               - debugger support

Figure 8: Structure of the code generator.

5 Results

The test set used for the performance evaluation is described in Table 1.
It contains programs and libraries of different sizes, and importing different
types of libraries. Large graphical applications such as webscape, postcard,
and ps2html tend to use large libraries, and involve numerous imported interfaces.
Webvbt is one of the libraries used by webscape, and columns is a
smaller graphical application. Netobjd is a small program involving network
objects, and m3browser is a module/type browser with a web server interface.
M3tohtml is a small program converting Modula-3 modules to html,
and m3front is a library implementing the Modula-3 compiler frontend.

For each package are shown the source code size, number of lines, number
of non blank lines, number of directly and indirectly imported interfaces, and
the memory consumed by the interface cache to represent the AST. All tests
were performed on a 75MHz Intel Pentium with 32MB of main memory and
running Linux 2.0. All times are in elapsed seconds on a single user machine,
and thus account for input/output delays.

The performance of both the integrated backend and the compilation
server was evaluated and compared to the original DEC SRC Modula-3
compiler version 3.6. The compilation times for the original compiler are in
Table 2.

Detailed measures were obtained for the various compilation phases using
the compiler built-in timers. Column M3 => I.L. is the time required to translate
Modula-3 code into intermediate language, which is performed by the
frontend. I.L. => ass. is the time needed by the gcc based backend (m3cgc1)
to convert the intermediate language into assembly, and ass. => reloc. is the
time taken by the assembler to produce a relocatable binary file.

Table 3 presents the results for the compiler with the integrated backend
instead of the gcc based backend. The fast code generator goes directly from
the code generating method calls to the relocatable binary file, as indicated
by M3 => reloc. Since the code generation is performed at the same time as
parsing, the time required for each phase is not available separately.

    package     size      lines   lines*  interf.  modules  imported    memory
    compiled                                                interfaces  required
    columns     47.17K     1553    1306        6        7          30   2.26M
    netobjd     3.87K       151     128        1        2          27   1.12M
    webscape    11.72K      374     347                 1          69   2.96M
    m3browser   210.58K    7005    6312       12       13          67   1.93M
    webvbt      191.76K    6251    5361       21       20         120   3.63M
    m3tohtml    83.08K     3035    2656        8        9          35   1.26M
    m3front     1.367M    45827   39789      175      171          38   3.06M
    postcard    341.81K   10418    9575       12       11         161   3.93M
    ps2html     315.85K   13468    9197       30       30         111   3.57M

Table 1: Packages used to evaluate the performance



                                    Time (in seconds)
    package     smart     M3 =>    I.L. =>   ass. =>   linking   other    total
    compiled    recomp.   I.L.     ass.      reloc.
    columns       1.19     4.21      8.65      4.44      0.85     0.32    19.66
    netobjd       0.78     0.87      1.50      0.85      0.43     0.32     4.75
    webscape      3.48     2.56      4.06      1.51      1.50     0.91    14.02
    m3browser     1.04    10.08     36.09     11.88      1.11     0.81    61.01
    webvbt        4.80    14.17     37.52     16.83      0.97     2.03    76.32
    m3tohtml      0.79     4.00     16.49      6.39      0.85     0.61    29.13
    m3front       1.62    69.56    220.36    108.03      5.84    17.32   422.73
    postcard      2.79    22.38     55.34     16.43      4.16     0.74   101.84
    ps2html       3.43    28.24     84.38     27.34      6.02     1.48   150.89

Table 2: Compilation with the gcc based code generator

                          Time (in seconds)
    package     smart     M3 =>     linking   other    total
    compiled    recomp.   reloc.
    columns       1.16      6.46      0.85     0.61     9.08
    netobjd       0.77      1.24      0.43     0.39     2.83
    webscape      3.98      2.93      1.69     1.14     9.74
    m3browser     0.97     14.91      0.85     0.46    17.19
    webvbt        4.52     21.60      1.01     2.02    29.15
    m3tohtml      0.94      5.83      1.06     0.39     8.22
    m3front       2.45     95.54      8.05    16.88   122.92
    postcard      3.07     31.75      8.51     1.09    44.42
    ps2html       3.62     35.73      4.77     1.28    45.40

Table 3: Compilation with the integrated backend

The compilation time reduction is significant for all packages. When the 
compilation time is larger than 20 seconds with the gcc based backend, the 
total compilation time may be cut in half with the integrated backend. Other 
tests [T3] demonstrate that the production of position independent code does 
not significantly affect the compilation time. However, the generation of 
debugging information increases the total compilation time by 10-30%. 
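The claim about packages taking more than 20 seconds can be checked directly from the totals of Tables 2 and 3. The short script below only re-derives ratios from those published totals; the function and variable names are chosen for the example.

```python
# Total compilation times in seconds, copied from Tables 2 and 3.
gcc_backend = {
    "columns": 19.66, "netobjd": 4.75, "webscape": 14.02,
    "m3browser": 61.01, "webvbt": 76.32, "m3tohtml": 29.13,
    "m3front": 422.73, "postcard": 101.84, "ps2html": 150.89,
}
integrated_backend = {
    "columns": 9.08, "netobjd": 2.83, "webscape": 9.74,
    "m3browser": 17.19, "webvbt": 29.15, "m3tohtml": 8.22,
    "m3front": 122.92, "postcard": 44.42, "ps2html": 45.40,
}


def speedups(before, after):
    """Ratio before/after for every package present in both tables."""
    return {name: before[name] / after[name] for name in before if name in after}


ratios = speedups(gcc_backend, integrated_backend)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:10s} {gcc_backend[name]:7.2f}s -> "
          f"{integrated_backend[name]:7.2f}s  x{ratio:.2f}")
```

Every package above 20 seconds with the gcc based backend ends up with a ratio greater than 2, while the smallest gain is for webscape, where smart recompilation and linking rather than code generation make up much of the total.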

To evaluate the quality of the generated code, a program was compiled
with the gcc based backend without optimization and with full optimization
(-O2), and with the fast code generator. The version compiled with the fast
code generator ran 6% faster than the version compiled without optimization
and 9% slower than the version compiled with optimization. The memory
footprint of the program compiled with the fast code generator is 18% smaller
than without optimization and 14% larger than with optimization.

These results are consistent with those reported by Tanenbaum et al.
[12], where they obtained a speedup between 2 and 3 by using a simplified
backend. Their backend however still retained the ability to interface to
different frontends and targets.

The compilation server was evaluated both under a full recompilation and
in a typical situation where only a few files were modified. For the first case,
the executable and all object files were removed before recompiling. This is
shown in Table 4.

                                    Time (in seconds)
    package     smart     M3 =>    I.L. =>   ass. =>   linking   other    total
    compiled    recomp.   I.L.     ass.      reloc.
    columns       0.63     1.96      9.04      4.45      0.86     0.26    17.20
    netobjd       0.46     0.28      1.52      1.09      0.44     0.25     4.04
    webscape      3.00     1.58      4.06      1.55      1.71     0.65    12.55
    m3browser     0.63     8.90     37.19     12.24      1.55     0.40    60.91
    webvbt        4.46    12.04     39.80     17.41      1.29     2.31    77.31
    m3tohtml      0.41     3.29     16.40      6.90      0.86     0.37    28.23
    m3front       1.39    77.84    235.90    121.02     34.70    31.18   502.03
    postcard      2.48    21.67     57.73     18.35      7.32     0.72   108.27
    ps2html       3.12    24.73     89.48     30.44     11.54     1.59   160.90

Table 4: Full recompilation of the packages with AST's in the cache

The savings brought by maintaining the interface cache across two compi- 
lations are smaller than originally anticipated. Only small gains are obtained 
for the small packages. Since the SRC Modula-3 compiler already contains 
an interface cache, the time to parse the interfaces is relatively small as com- 
pared to the code generation time, and a small fraction of the time is saved 
by caching the interfaces across compilations. 

More disturbing is the slight degradation obtained for some of the larger 
packages. Removing the network objects communication between the client 
and server did not change the results. The explanation lies in the increased 
memory footprint of the compilation server. With 32 MB, there is some com- 
petition for physical memory between buffered files (e.g. libraries, object files 
and the generated executable), and executing programs (e.g. the compiler 
and linker). Indeed, when the same tests were repeated on a computer with 
twice as much physical memory, a slight improvement was measured instead 
of a degradation. 

                                              Time (in seconds)
    compiler used              smart     M3 =>    I.L. =>   ass. =>   linking   other   total
                               recomp.   I.L.     ass.      reloc.
    standard with m3cgc1         0.73     0.97      2.73      0.86      0.42    0.22    5.93
    server with m3cgc1           0.76     0.62      2.73      0.84      0.42    0.15    5.52
    standard with integrated     0.74    --------- 1.84 ---------      0.42    0.24    3.24
    server with integrated       0.64    --------- 1.36 ---------      0.42    0.20    2.62

Table 5: Recompilation of postcard after modifications to 2 files

In a full recompilation, the existing interface cache brings most of the
benefits, and the code generation phase largely dominates because all modules
need to be recompiled. Therefore, the savings brought by the server are
minor as compared to the total compilation time and may even turn into
a degradation if memory is short. The picture improves however when the
compilation server is combined with the fast code generator. Indeed, the
2.46s reduction for columns represents 12.5% of 19.66s but amounts to 27.1%
of 9.08s. These savings only involve the smart recompilation and frontend
parsing phase, and are independent of the code generation time.
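The percentages quoted for columns follow from three totals found in Tables 2, 3 and 4. The helper below is only a worked version of that arithmetic, with names picked for the example.

```python
def share(saving, total):
    """Saving expressed as a percentage of a total compilation time."""
    return round(100.0 * saving / total, 1)


gcc_total = 19.66         # columns, standard compiler with gcc backend (Table 2)
gcc_server_total = 17.20  # columns, compilation server with gcc backend (Table 4)
integrated_total = 9.08   # columns, integrated backend without server (Table 3)

# Time saved by keeping the interface cache across compilations.
server_saving = round(gcc_total - gcc_server_total, 2)

print(server_saving,
      share(server_saving, gcc_total),
      share(server_saving, integrated_total))
```

The same absolute saving weighs more than twice as much once code generation no longer dominates the total, which is the reason the two extensions are worth combining.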

The tests presented in Tables 5 and 6 are more typical of the edit compile
debug cycle. They consist of recompiling a large graphical application, postcard,
after 2 files, and 4 files, are modified. Four different cases are covered:
existing compiler and code generator, fast code generator, compilation server,
compilation server with fast code generator. For these tests, the computer
used had the same characteristics except for a Pentium Pro 180MHz processor
and 64MB of RAM.

These tests clearly demonstrate that the compilation server and fast
code generator contribute independently to the recompilation time
reduction. In the four-file case, their combined effect amounts to a
reduction from 9.77s to 3.22s. This is especially impressive considering
that the Modula-3 compiler is already much more efficient than most C/C++
compilers, because of the interface cache and fine grain dependency
analysis allowed by the structured interface mechanism.

Compiler used               Time (in seconds)
                            smart     M3 =>   I.L. =>   ass. =>   linking   other   total
                            recomp.   I.L.    ass.      reloc.
standard with m3cgc1        0.75      1.89    4.83      1.68      0.42      0.20    9.77
server with m3cgc1          0.44      1.09    4.85      1.68      0.43      0.17    8.66
standard with integrated    0.73      2.66 (M3 => reloc.)         0.43      0.27    4.09
server with integrated      0.51      2.05 (M3 => reloc.)         0.42      0.24    3.22

Table 6: Recompilation of postcard after modifications to 4 files 
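
The figures in Table 6 can be cross-checked with a few lines of arithmetic.
The sketch below is not part of the original system: the dictionary layout
and every name in it (TABLE6, saving, and so on) are illustrative
assumptions, and the integrated code generator's single M3-to-relocatable
figure is stored as one phase.

```python
# Times in seconds transcribed from Table 6. The dictionary layout and the
# key names are illustrative assumptions, not part of the original system.
TABLE6 = {
    ("standard", "m3cgc1"): {"phases": [0.75, 1.89, 4.83, 1.68, 0.42, 0.20], "total": 9.77},
    ("server", "m3cgc1"): {"phases": [0.44, 1.09, 4.85, 1.68, 0.43, 0.17], "total": 8.66},
    # The integrated generator goes from M3 to relocatable code in one step.
    ("standard", "integrated"): {"phases": [0.73, 2.66, 0.43, 0.27], "total": 4.09},
    ("server", "integrated"): {"phases": [0.51, 2.05, 0.42, 0.24], "total": 3.22},
}


def total(compiler, generator):
    """Total recompilation time reported for one configuration."""
    return TABLE6[(compiler, generator)]["total"]


def saving(before, after):
    """Seconds saved when moving from configuration `before` to `after`."""
    return round(total(*before) - total(*after), 2)


# Each row's phases should add up to its reported total (within rounding).
for config, row in TABLE6.items():
    assert abs(sum(row["phases"]) - row["total"]) < 0.005, config

# Effect of the compilation server with each code generator.
server_with_m3cgc1 = saving(("standard", "m3cgc1"), ("server", "m3cgc1"))
server_with_integrated = saving(("standard", "integrated"), ("server", "integrated"))

# Effect of the fast code generator with and without the server.
fast_gen_standard = saving(("standard", "m3cgc1"), ("standard", "integrated"))
fast_gen_server = saving(("server", "m3cgc1"), ("server", "integrated"))

combined_speedup = total("standard", "m3cgc1") / total("server", "integrated")

print(server_with_m3cgc1, server_with_integrated)   # 1.11 0.87
print(fast_gen_standard, fast_gen_server)           # 5.68 5.44
print(round(combined_speedup, 2))                   # 3.03
```

With these figures, the server saves roughly 1s with either code generator
and the fast code generator saves roughly 5.5s with or without the server,
which is consistent with the two contributions being largely independent.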

6 Conclusion 

A fast recompilation system for a modular, compiled, object-oriented
language was presented. It benefits from the existing interface cache and
fine grain dependency analysis, and was extended with a persistent
interface cache, a fast code generator, and support for Linux ELF dynamic
libraries. The resulting reduction in recompilation time is roughly
threefold: for a large graphical application, recompiling after four files
were modified took 3.22s instead of 9.77s.

In the coming years, faster processors, larger main memory and more
complex programs may be expected. However, the average size of each file
and the number of files modified between compilations are not expected to
change significantly. The net effect would be a gradual reduction of the
file parsing and code generation time, which currently dominates the
recompilation time at 2.05s out of 3.22s. Smart recompilation and linking
account for most of the remaining time (0.92s out of 1.17s). These phases
may be adversely affected by increasing program complexity and eventually
dominate the recompilation time.

Increases in program complexity are more likely to change the number of
libraries imported by each program than the size of each program file or
library. The smart recompilation system could accordingly be further
optimized in several ways. More information about each imported library
could be cached, as applications importing more libraries spend much
longer in the smart recompilation phase. A tighter integration between the
program editor and the compilation server could allow the smart
recompilation to perform most of its work incrementally. When a file is
opened and modified, the compiler could pre-compute which files are
affected and need to be recompiled or dependency checked. When a file is
saved, the recompilation of that file and of the affected files may
proceed. Thus, when the last file is saved, only that last file and the
files it affects would need recompilation.
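
The pre-computation of affected files suggested above can be sketched as a
traversal of the inverted import graph. This is a minimal illustration
under stated assumptions, not the authors' implementation: the function
name affected_files, the file names, and the file-level granularity are all
hypothetical, and the real compiler's fine grain dependency analysis would
typically select a smaller set.

```python
from collections import deque


def affected_files(imports, modified):
    """Return the modified files plus every file that transitively imports one.

    `imports` maps each file to the files it imports (hypothetical layout).
    """
    # Invert the relation: importers[f] is the set of files that import f.
    importers = {}
    for source, dependencies in imports.items():
        for dependency in dependencies:
            importers.setdefault(dependency, set()).add(source)

    # Breadth-first walk from the modified files towards their importers.
    seen = set(modified)
    queue = deque(modified)
    while queue:
        current = queue.popleft()
        for importer in importers.get(current, ()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return seen


# Hypothetical package: two interfaces (.i3) and three modules (.m3).
imports = {
    "Main.m3": {"Mail.i3", "UI.i3"},
    "Mail.m3": {"Mail.i3"},
    "UI.m3": {"UI.i3"},
    "UI.i3": {"Mail.i3"},
}

after_interface_edit = affected_files(imports, {"Mail.i3"})
after_module_edit = affected_files(imports, {"Mail.m3"})
```

In this example, editing the interface Mail.i3 marks all five files as
candidates, whereas editing only the implementation Mail.m3 marks that
single file. An editor could run such a traversal when a file is first
modified, so that the candidate set is already known when the file is
saved.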

The final recompilation steps, prelinking and linking, involve the complete
program and may therefore become the dominant step as program complexity
increases. The Linux ELF linker is, however, surprisingly efficient for
dynamically linked libraries. Only a small fraction of each library needs
to be read by the linker. Yet, because of the lazy procedure linking
algorithm used in ELF, the program startup penalty imposed by dynamic
linking is small. As may be seen from the results, the linking time is
mostly affected by the package size rather than by the number of imported
libraries.

Linking is an I/O intensive process, and the availability of free RAM for
the I/O buffer cache strongly impacts the elapsed time. This was apparent
for large programs, where the compilation server actually degraded the
performance because of the competition for memory. Assuming sufficient
physical memory, it may be possible to reduce the linking time by
dynamically loading each object file as it is being recompiled. This would
remove the linking step, which reads all object files from disk, merges
them into an executable, and writes the executable to disk before loading
it.

The prelinking step, listed as other in the tables, currently uses a small
fraction of the overall recompilation time. It checks the coherence of
opaque type revelations, determines the module initialization order, and
generates the run time type identification data structures. The processing
time is proportional to the total number of modules in the final program.
With increasing program complexity, this may in the long term become an
important factor.
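
One of the prelinker tasks, choosing a module initialization order, can be
illustrated with a topological sort of the import graph. The sketch below
is an assumption-laden illustration rather than the prelinker's actual
algorithm: the function name initialization_order and the module names are
hypothetical, and the sketch simply rejects import cycles instead of
handling them.

```python
def initialization_order(imports):
    """Order modules so that each one comes after every module it imports.

    `imports` maps a module name to the modules it imports (hypothetical
    layout). Raises ValueError on a cycle, which this sketch does not handle.
    """
    modules = set(imports)
    for dependencies in imports.values():
        modules.update(dependencies)

    # pending[m]: imports of m not yet initialized; importers[m]: users of m.
    pending = {m: set(imports.get(m, ())) for m in modules}
    importers = {m: [] for m in modules}
    for module, dependencies in pending.items():
        for dependency in dependencies:
            importers[dependency].append(module)

    # Kahn's algorithm; sorting keeps the result deterministic.
    ready = sorted(m for m, dependencies in pending.items() if not dependencies)
    order = []
    while ready:
        module = ready.pop(0)
        order.append(module)
        for user in sorted(importers[module]):
            pending[user].discard(module)
            if not pending[user]:
                ready.append(user)

    if len(order) != len(modules):
        raise ValueError("import cycle: this sketch needs an acyclic graph")
    return order


# Hypothetical program: Main imports Mail and UI, and UI imports Mail.
order = initialization_order({"Main": ["Mail", "UI"], "UI": ["Mail"], "Mail": []})
print(order)  # ['Mail', 'UI', 'Main']
```

Every module and every import edge has to be visited, so the work of such a
pass grows with the size of the whole program rather than with the number
of modified files.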

In C++, run time type identification is a recent addition and is likely to
bring different problems. Indeed, a declaration appearing in a .h file may
be included and compiled several times. The prelinker must ideally remove
duplicate virtual method tables and run time type information. The Java
language does not specify a static initialization order nor does it have
opaque types to check. The prelinker step is thus mostly avoided. However,
since a class must be initialized at its first active use, many tests must
be inserted at run time to initialize each class the first time it is used.
Thus, the savings in the prelink phase are offset by the run time overhead.
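
The run time tests mentioned for Java can be pictured with a small model of
initialization at first active use. This is only an illustration written in
Python, not how any particular Java implementation works; the class
LazilyInitializedClass, its fields, and the Mailbox example are
hypothetical.

```python
class LazilyInitializedClass:
    """Models a class whose static initializer runs at its first active use."""

    def __init__(self, name, initializer):
        self.name = name
        self._initializer = initializer
        self.initialized = False
        self.checks = 0  # counts the guard tests executed at run time

    def active_use(self):
        # This test is executed on every active use, not only on the first.
        self.checks += 1
        if not self.initialized:
            # Mark first, so a recursive use during initialization does not loop.
            self.initialized = True
            self._initializer()


log = []
mailbox = LazilyInitializedClass("Mailbox", lambda: log.append("init Mailbox"))
for _ in range(3):
    mailbox.active_use()
```

After three uses the initializer has run once but the guard has been
evaluated three times, which is the recurring cost the paragraph above
weighs against the saved prelink work.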

The fast code generator is now part of the freely redistributable
Polytechnique Montreal Modula-3 distribution, originally found at
http://m3.polymtl.ca/m3/pkg but now hosted at
http://www.elegosoft.com/pm3/, and received enthusiastic feedback from
users around the world. The compilation server is available separately at
http://www.professeurs.polymtl.ca/michel.dagenais/pkg/m3server.tar.gz